Assessment of TVT and STS Risk Score Performances in Patients Undergoing Transcatheter Aortic Valve Replacement

Background The Society of Thoracic Surgeons (STS) score has been used to risk stratify patients undergoing transcatheter aortic valve replacement (TAVR). The Transcatheter Valve Therapy (TVT) score was developed to predict in-hospital mortality in high/prohibitive-risk patients. Its performance in low and intermediate-risk patients is unknown. We sought to compare TVT and STS scores’ ability to predict clinical outcomes in all-surgical-risk patients undergoing TAVR. Methods Consecutive patients undergoing TAVR from 2012-2020 within a large health care system were retrospectively reviewed and stratified by STS risk score. Predictive abilities of TVT and STS scores were compared using observed-to-expected mortality ratios (O:E) and area under the receiver operating characteristics curves (AUCs) for 30-day and 1-year mortality. Results We assessed a total of 3270 patients (mean age 79 ± 9 years, 45% female), including 191 (5.8%) low-risk, 1093 (33.4%) intermediate-risk, 1584 (48.4%) high-risk, and 402 (5.8%) inoperable. Mean TVT and STS scores were 3.5% ± 2.0% and 6.1% ± 4.3%, respectively. Observed 30-day and 1-year mortality were 2.8% (92/3270; O:E TVT 0.8 ± 0.16 vs STS 0.46 ± 0.09), and 13.2% (432/3270), respectively. In the all-comers population, both TVT and STS risk scores showed poor prediction of 30-day (AUC: TVT 0.68 [0.62-0.74] vs STS 0.64 [0.58-0.70]), and 1-year (AUC: TVT 0.65 [0.62-0.58] vs STS 0.65 [0.62-0.58]) mortality. After stratifying by surgical risk, discrimination of the TVT and STS scores remained poor in all categories at 30 days and 1 year. Conclusions An updated TAVR risk score with improved predictive ability across all-surgical-risk categories should be developed based on a larger national registry.


Introduction
2][3] Patients undergoing aortic valve replacement for severe AS often have multiple comorbidities, which increases perioperative morbidity and mortality and reduces late survival. 4The PARTNER trials demonstrated a significant reduction in mortality among high-risk patients undergoing TAVR compared with SAVR.However, overall mortality after TAVR in this particular group of patients remains relatively high. 5In contrast, patients with AS who were at intermediate or low surgical risk did not demonstrate significant differences in long-term mortality after TAVR compared with SAVR. 6,7Therefore, accurate risk assessment and predictive tools are needed to inform physicians appropriately, counsel patients, and optimize the allocation of health care resources.
In the absence of a specific risk score for stratifying TAVR patients, the Society of Thoracic Surgeons (STS) Predicted Risk of Mortality (STS-PROM) score and the overall judgment of the site heart team were routinely used in clinical trials to classify TAVR patients with symptomatic severe AS.Moreover, The American Heart Association/ American College of Cardiology (ACC) guidelines currently recommends assessing TAVR perioperative risk using only the STS-PROM score. 1 However, when used to predict mortality after TAVR, this score has been shown to overestimate 30-day mortality.This is likely because the STS-PROM score was designed to predict outcomes among surgical patients; therefore, extrapolation to patients with TAVR remains challenging, and its appropriateness is controversial.
The STS/ACC Transcatheter Valve Therapy (TVT) Registry was created in 2011 to capture patient characteristics, procedural variables, and outcomes to report baseline benchmarks for performance of TAVR in the United States. 80][11][12] However, TAVR was only commercially available to patients considered to be inoperable or at high surgical risk before 2016, and thus, the population from which the TVT score was derived did not include low and intermediate surgical risk patients undergoing TAVR.Although the TVT score has been validated in patients with high in-hospital mortality risk as predicted by STS-PROM, 13 this score has not been validated for short-term outcomes in lower surgical risk populations nor for long-term outcomes in any risk population.
The goal of this study was to assess the predictive ability of the TVT risk model for short-and long-term mortality compared with the STS-PROM in patients of all-risk profiles undergoing TAVR.

Methods
Consecutive patients undergoing TAVR between January 2012 and October 2020 within the Baylor Scott & White Health system were identified and analyzed.There were 4 participating hospitals performing TAVR within this large collaborative system.Center experience ranged from >10 years to <2 years.Patients were evaluated by an institutional multidisciplinary heart team consisting of cardiothoracic surgeons and cardiologists for eligibility for aortic valve intervention, including TAVR.Patients underwent formal preoperative evaluation and imaging through a TAVR clinic according to best practice guidelines established in our system.One-year follow-up information on survival status was collected using a previously validated protocol utilizing electronic medical record-confirmed contact with our health care system, patient phone calls, and online obituary searches. 14Patients were stratified into surgical risk groups according to the 2008 STS risk model 15 (for cases from 2012 to 2017) and 2018 STS risk model 16 (for cases from 2018 to 2020), which were defined as low-risk (STS-PROM <4%), intermediate-risk (STS-PROM !4%), high-risk (STS-PROM !8%), and inoperable (STS-PROM >15%) groups.
Continuous variables are presented as mean AE standard deviation or median (interquartile range) as appropriate and categorical variables as proportions, unless otherwise specified.The predictive ability of TVT score and STS-PROM in predicting mortality at 30 days and 1 year was evaluated using area under the receiver operating characteristics curve (AUC) (Figure 1).While acknowledging that the STS score was initially designed to predict 30-day outcomes, we wanted to assess whether it could also predict 1-year all-cause mortality.An AUC >0.7 was considered acceptable predictive ability. 17Stratified bootstrapping technique was used to compute 95% confidence intervals with 2000 bootstrap replicates.In addition, the performance of the existing risk scores was evaluated using the observed-to-expected ratios (O:E) in predicting both short-and long-term mortality.The analysis was performed on both the all-comers study population and after stratification into surgical risk groups by STS-PROM as defined previously.All inferential and descriptive analyses were performed at the .05significance level for a 2-sided test by using R version 4.0.0 (R Foundation for Statistical Computing).
The review board and ethics committee at Baylor The Heart Hospital -Plano approved this study.Given the retrospective nature of the study, informed consent was not required.1).As expected, mean TVT risk score increased with STS-PROM based surgical risk.

A
Overall mortality was 2.8% (92/3270) at 30 days and 13.2% (432/3270) at 1 year.Observed mortality within each risk category as well as the observed-to-expected (O:E) ratios based on the TVTrisk score and STS-PROM are detailed in Tables 2 and 3, respectively.Overall, the observed short-term mortality was closer to the expected mortality based on TVTrisk score.Both scores overestimated the mortality risk with an exception of the TVT risk score at 30 days in patients with inoperable risk (O:E 1.24 AE 0.5).At 1 year, both risk scores grossly underestimated the mortality risk.The mortality rate at each time point increased with the severity of the surgical risk categorization.Both TVT score and STS-PROM performed poorly at 30 days and 1 year for all-risk patients and when subjected to surgical risk stratification (Table 4).After risk stratification, the TVT score performed better than the STS-PROM at both 30 days and 1 year in high-risk and inoperative patients only (Table 4).

Discussion
Advances in the technique of aortic valve replacement and the periprocedural phases of care have reduced operative risk to low levels, even among high-risk patients.However, some fatal complications remain unpredictable and unavoidable.The accuracy of risk-stratifying tools and clinical prediction models used to guide clinical decisionmaking and interventions should be validated before broad clinical implementation. 18In the absence of appropriate paradigms for risk stratification, calculation of the STS score remains the recommended method for assessing perioperative risk in TAVR.However, the extension of its use to predict clinical outcomes among the cohort of TAVR patients is controversial.Indeed, the derivation of the STS score was based on a dataset composed only of surgical patients and was therefore not designed to predict mortality in patients undergoing TAVR.The STS/ACC TVT risk model was developed to predict in-hospital mortality after TAVR.Notably, the TVT risk score was developed at a time when TAVR patients were, predominantly, at high or prohibitive surgical risk. 13Since then, the use of TAVR has expanded to include patients across the full spectrum of surgical risk.
The TVT registry captures patient data specific to TAVR patients, so it is pertinent to evaluate the predictive performance of the TVT risk model against STS-PROM in patients of all-risk categories.It is also Receiver operating characteristic curve for TVT and STS-PROM scores.The area under receiver operator characteristics curves was utilized to determine predictive ability of the TVT and STS risk scores across all-risk categories.It demonstrated in short term, the TVT risk score had a higher predictive ability in patients of high and inoperable risk.However, both risk scores had poor-predictive performance across all-risk categories and time point.PROM, predicted risk of mortality; STS, Society of Thoracic Surgeons; TVT, transcatheter valve therapy.essential to understand the predictive ability of these scores beyond the index hospital stay or first 30 days after TAVR.In this study, we found that the TVT risk model was not a more accurate predictor of mortality than the STS-PROM in low and intermediate-risk patients, although it did have improved predictive ability in high-risk and inoperable patients.However, the predictive performance of both TVT and STS risk scores were suboptimal in patients of all-risk categories and time points overall (Central Illustration).

Performance of TVT and STS risk score in all-comers population
In the era of increasing TAVR procedures, TAVR-specific risk scores derived from multiple databases have been developed worldwide.In our study, the predictive ability of the TVT risk score was essentially equivalent to that of the STS risk score within the all-comers population in the short term.Both scores overestimated mortality risk based on our observed outcomes.Other studies have yielded differing results.An analysis assessing the performance of surgical risk scores (Euroscore II and STS) and TAVR risk scores (German Aortic Valve score, FRANCE-2 score, OBSERVANT model, and the TVT risk model) against the United Kingdom Transcatheter Aortic Valve Implantation (UK TAVI) registry found that the TVT risk score had a better predictive performance at 30 days compared with the surgically derived scores.In addition, this study showed that TAVR-specific scores had higher discrimination than surgical risk scores. 19Even so, the predictive performance of all-risk categories in this study was marginally good (AUC <0.7).An improved risk score for patients of all-risk categories undergoing TAVR is therefore necessary.
Despite some hesitant optimism for the performance of the TVT score up to 30 days, the applicability of the TVT risk model beyond the short term has not been widely evaluated.Our long-term findings suggest that both risk scoring systems have poor-predictive power.Therefore, to guide patient selection, newer prediction models with better discrimination and calibration tools should be developed, not only for short-term but also for longer-term follow-up.

Performance of TVT and STS risk score in high-risk patients
1][22] The TVT score was derived from high-risk patients undergoing TAVR and was designed to predict in-hospital mortality, 13 whereas the STS score was derived from all-comers data for patients undergoing SAVR.Compared with the STS score, the TVT risk model includes more limited and objective parameters for calculating risk of mortality compared with the STS-PROM to minimize interobserver variability and potentially improve predictive ability. 23It is not surprising, therefore, that our data show that in the short term, the TVT score outperformed the STS score in high-risk patients, with more favorable observed-to-expected ratios.This finding is consistent with previous validation studies. 9Additionally, we found that the TVT score moderately outperformed the STS-PROM up to 1 year from TAVR in patients of high or inoperable risk.
The TVT and STS-PROM scores do not collect quality-of-life data that can serve as a general index of procedural benefit, especially among high-risk patients.An example of this could be The Kansas City Cardiomyopathy Questionnaire, assessed at 30 days and 1 year after the procedure.Therefore, linking this kind of information to the wealth of clinical data in the TVT and STS scores will allow models to be built to predict the probability of TAVR benefit.For example, a patient may have a good prediction of TAVR mortality but a low probability of procedural benefit.Information of this type should be considered when designing new risk models and should serve as powerful tools in selecting appropriate patients.

Performance of TVT and STS-PROM in low-and intermediate-risk patients
The applicability of the TVT score in low-and intermediate-risk patients undergoing TAVR has not been well studied.Our findings show that the STS score performs poorly (AUC <0.6) in the intermediate-risk population.The observed mortality was much lower than that predicted by TVT and STS scores.Our findings are consistent with previously published studies demonstrating that STS-PROM overpredicts postoperative mortality in intermediate-and low-risk patients. 24,25In addition, the TVT score did not have greater predictive power than STS-PROM in either the short or long term.This suggests that there is currently no adequate predictive model for intermediate-risk patients undergoing TAVR.Regarding the low-risk population, the small percentage (5.8%) of low-risk patients in our study makes it difficult to draw firm conclusions for this patient group, underscoring the need for future studies with a larger sample of low-risk patients to validate or refute existing risk scores.

Future direction of prediction models for short-and long-term mortality
Although the higher predictive ability of the TVT risk model compared with the STS-PROM in high-risk patients is reinforced in our study, the current risk models developed from TAVR-specific registries  are suboptimal across all-risk groups.In fact, both risk scores have obvious limitations; patients can have a very high operative mortality risk and still have low scores.There are several comorbidities not captured in the STS and TVT scores, such as degree of vascular calcification, porcelain aorta, chest wall abnormalities, frailty, immunosuppression, cirrhosis, and others.Frailty is particularly difficult to quantify.External validation studies using the Israeli TAVI registry, 26 and Swiss TAVI registry 12 (C-statistics were 0.66 and 0.68 for the TVT risk model, respectively) have repeatedly demonstrated, at best, moderate predictive performance of the TVT score.Attempts to update the TVT risk model based on contemporary patient data to account for calibration shift have not been largely successful (AUC 0.6 to 0.63). 27Given the increasingly widespread use of the TAVR procedure and the ability to capture patient and procedural details within the TVT registry, the development of an improved risk score is needed.Data from the TVT registry on a national level is needed to recalibrate the existing model or to create an entirely new risk score that is more accurate in the short and long term, and applies to patients of all-risk categories who are candidates for TAVR.

Limitations
This study is subject to all limitations present in retrospective, observational studies.This information was pooled from a large regional health care system, therefore, the generalizability to a larger TAVR population with differing operator experience and beyond our geographical region is uncertain.Another limitation is the variability in cohort size of patients within the different risk categories.There were significantly fewer patients in the low-risk category, and lesser events (mortalities); therefore, it is difficult to make strong conclusions based on the wide confidence intervals in our resulting metrics.This is not surprising, however, as the majority of patients undergoing TAVR will fall into higher risk categories given the timeframe of the study.Future studies with a larger sample size of low-risk patients should be performed to validate the existing risk scores for both long-and short-term mortality.

Conclusion
This study demonstrates that the predictive ability of the TVT and STS risk scores for TAVR is suboptimal in the short and long term.Comparatively, the TVT risk score is more predictive than the STS score only in patients with high surgical risk and inoperable patients.In patients with low and intermediate risk, both scores are poor discriminators of 30-day and 1-year mortality.Further studies on a national level should be conducted to develop a TAVR-specific risk model that applies to all-risk categories to optimize patient selection and improve clinical outcomes.Until such a validated TAVR risk score is developed, clinical judgment by the on-site heart team should be combined with conventional scores to guide clinical decision making.

Declaration of competing interest
Karim Al-Azizi is a proctor/consultant to Edwards LifeSciences, a consultant/advisory board member for Medtronic, a consultant for Boston Scientific, and a speaker for Philips.Robert Stoler is a proctor and on the advisory boards of Medtronic, Boston Scientific, and Edwards Life-Sciences, and is on the advisory board of Biotronik.Srinivasa Potluri is on the advisory board and is a proctor and speaker for Medtronic, Boston Scientific, Abbott, and Cordis, and is a proctor and speaker for Edwards LifeSciences, Terumo, and AstraZeneca.Molly Szerlip is a proctor for Edwards LifeSciences and Abbott Vascular, a speaker for Edwards Life-Sciences and Boston Scientific, a consultant for Edwards LifeSciences, Abbott Vascular, and Boston Scientific, and serves on the advisory board of Abbott Vascular and the steering committee of Medtronic.Michael Mack is an uncompensated trial co-principal investigator for Abbott and Edwards LifeSciences and a trial study chair for Medtronic.The remaining authors report no competing interests.

Funding sources
Data acquisition efforts of this study were funded by the Cardiovascular Research Review Committee (CVRRC) of the Baylor Health Central Illustration.Performance of the STS and TVT score in assessing risk among patients undergoing TAVR.

Figure 1 .
Figure 1.Receiver operating characteristic curve for TVT and STS-PROM scores.The area under receiver operator characteristics curves was utilized to determine predictive ability of the TVT and STS risk scores across all-risk categories.It demonstrated in short term, the TVT risk score had a higher predictive ability in patients of high and inoperable risk.However, both risk scores had poor-predictive performance across all-risk categories and time point.PROM, predicted risk of mortality; STS, Society of Thoracic Surgeons; TVT, transcatheter valve therapy.

Table 1 .
Patient characteristics and outcomes.Values are presented as number (%) or mean AE standard deviation.Demographics and comorbidities characterizing patients undergoing TAVR and stratified by surgical risk based on the calculated STS-PROM.Most patients fell into the high-risk surgical category given the timeframe of the study.BMI, body mass index; CABG, coronary artery bypass grafting; ICD, implantable cardiac defibrillator; NYHA, New York Heart Association; PAD, peripheral artery disease; PCI, percutaneous coronary intervention; PPM, permanent pacemaker; STS-PROM, Society of Thoracic Surgeons Predicted Risk of Mortality; TAVR, transcatheter aortic valve replacement; TVT, transcatheter valve therapy.

Table 2 .
Outcomes and predictability of the TVT risk score.
Values are presented as n (%).Observed-to-expected mortality [O:E] presented as ratio AE standard error.The O:E ratio is calculated based off the TVT risk score for inhospital mortality across each surgical risk category.E, expected; O, observed; TVT, transcatheter valve therapy.

Table 4 .
Predictability of TVT and STS scores stratified by risk category.

Table 3 .
Outcomes and predictability of the STS risk score.Values are presented as n (%).Observed-to-expected mortality [O:E] presented as ratio AE standard error.The O:E ratio is calculated based off the STS risk score for 30-day mortality across each surgical risk category.E, expected.O, observed; STS, Society of Thoracic Surgeons.